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Ecologists often interpret variation in the spatial distribution of 
populations in terms of responses to environmental features, but dis¬ 
entangling the effects of individual variables can be difficult if la¬ 
tent effects and spatial and temporal correlations are not accounted 
for properly. Here, we use hierarchical models based on a Poisson 
log-normal mixture to understand the spatial variation in relative 
abundance (counts per standardized unit of effort) of yellow perch, 

Perea flavescens, the most abundant fish species in Lake Saint Pierre, 

Quebec, Canada. The mixture incorporates spatially varying envi¬ 
ronmental covariates that represent local habitat characteristics, and 
random temporal and spatial effects that capture the effects of unob¬ 
served ecological processes. The sampling design covers the margins 
but not the central region of the lake. We fit spatial generalized linear 
mixed models based on three different prior covariance structures for 
the local latent effects: a single Gaussian process (GP) over the lake, 
a GP over a circle, and independent GP for each shore. The mod¬ 
els allow for independence, isotropy, or nonstationary spatial effects. 

Nonstationarity is dealt with using two different approaches, geo¬ 
metric anisotropy, and the inclusion of covariates in the correlation 
structure of the latent spatial process. The proposed approaches for 
specification of spatial domain and choice of Gaussian process priors 
may prove useful in other applications that involve spatial correlation 
along an irregular contour or in discontinous spatial domains. 


1. Introduction. 

1.1. Ecological motivation. Ecologists often seek to interpret variation in the spatial distribution 
of populations in terms of responses to environmental features. However, population distributions 
are influenced simultaneously by numerous environmental variables which vary in space and time 
and can not always be directly observed. Disentangling the individual effects of these variables 
on populations and, more generally, interpreting environmental effects in ecological analyses, can 
be difficult if the influence of latent (unobserved) variables and spatial and temporal correlations 
are not accounted for properly. Our aim in this study is to understand the spatial variation in 
relative abundance (counts per standardized unit of effort) of yellow perch, Perea flavescens, the 
most abundant fish species in Lake Saint Pierre, Quebec, Canada. To this end, we use a hierar¬ 
chical modelling approach to incorporate spatially varying environmental covariates that represent 
local habitat characteristics, and random spatial and temporal effects to capture the effects of unob¬ 
served ecological processes. Hierarchical modelling has proven to be a powerful tool for dealing with 
spatio-temporal variation and latent effects and attaining improved inference on specific environ- 
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mental effects (reviewed in Wikle, 2003; Clark & Gelfand, 2006; Wikle, 2010). To account for the 
spatial arrangement of the sampling locations and the pronounced heterogeneity of environmental 
influences across the lake, we explore various alternative specifications of the spatial domain and 
the spatial correlation structure. 

1.2. Geo statistical models for counts: a brief overview. Animal counts are often highly variable 
in space and time and show over dispersion relative to the Poisson distribution. Poisson-lognormal 
mixtures naturally incorporate overdispersion (Bulmer, 1974), and have been used to extend the 
conventional geostatistical framework to spatially structured counts: Y( s) is assumed to follow 
a conditional independent Poisson distribution with mean A(s), where logA(s) = X(s)/3 + Z(s), 
X(s) is a p-dimensional row vector of covariates, (3 is the associated parameter column vector, and 
Z( s) is a local random effect (Diggle et ah, 1998). Z(-) is assumed to follow a zero-mean Gaussian 
process, with common variance a 2 and valid correlation function p{ s — s'), where s and s' are 
arbitrary locations in the study region (Diggle et al., 1998). The spatial process Z is stationary if 
the correlation function p(-) is a function of the difference s —s', and is both stationary and isotropic 
under the stronger assumption that p(-) is a function of the Euclidean distance between locations; 
the latter assumption implies that the distribution of Z is invariant under translation and rotation. 
However, the distribution of T(-) may not be stationary and isotropic even if p(-) is a function only 
of the Euclidean distance between locations. The vector defined as A = (A(si), • • • , A(s n ))' follows 
a multivariate log-normal distribution, and the marginal moments of the process Y(-) are obtained 
through the properties of the log-normal distribution and conditional expectations, that is 

E(Y( s)) = exp(/r(s) + ct 2 /2) = cc(s) 

Var{Y{ s)) = a(s) + a 2 (s) [exp(cr 2 ) — l] 

Cov{Y (s), Y (s')) = a(s)a(s') [exp(o- 2 p(s — s')) — l] , 

where fi{s) = X(s)/3. If one of the elements of X(-) varies with location, then the resultant covariance 
function for Y(-) will also vary with location. 

Although the inclusion of spatially varying covariates can yield an anisotropic process for Y(-), 
it may still be useful to consider models allowing a priori for anisotropy in spatial process Z 
(Denison & Mallick, 1998; Williams, 1998). However, the performance of these models would depend 
critically on whether the likelihood contains information to support inference on the parameters 
characterizing the anisotropy (Diggle et al., 1998). Various geostatistical approaches to modelling 
nonstationary continuous data, including the use of covariates in the dependence structure are 
reviewed in Guttorp & Schmidt (2013). Recent applications that use covariates in non-stationary 
dependence structures include Ingebrigtsen et al. (2014) and Poppick & Stein (2014). Inclusion of 
covariates in the dependence structure can be a parsimonious way of capturing the effect of latent 
processes that modify the effective distance between points, such as physical mechanisms that 
facilitate or hinder the transport of materials or energy between points. In the absence of detailed 
subject knowledge of the mechanisms behind spatial connections, the covariates in the dependence 
structure can provide a simple proxy for those mechanisms, and their interpretation may yield 
insight into potential sources of anisotropy in the study system. 

1.3. Sampling design and data collection. 

Study system. Fish counts and environmental measurements were collected between 14 June and 
22 August 2007 in Lake Saint Pierre (46°12’ N; 72°50’ W), a fluvial lake of the Saint Lawrence 
River (Quebec, Canada). Environmental conditions in the lake show strong spatial heterogeneity 
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and temporal variability. The lake is large (surface area: annual mean 315 knr 2 ; 469 knr 2 during the 
spring floods) and shallow (mean water depth 3.17 m; range 1.23 m). The ice-free (April-November) 
surface area of LSP fluctuates between 387 and 501 knr 2 depending on water level (Hudon, 1997). 
Lake Saint Pierre has distinct water masses running along its northern, central, and southern 
portions. These water masses originate from different sources and differ consistently in physical 
and chemical characteristics because lateral mixing is limited. A deep (>14 m) central navigation 
channel runs along the major axis of the lake; the channel has strong current and may act as a 
barrier to fish movement between the north and south shores (Fig. 1). 

Fish relative abundance. Fish were collected from the shallow littoral zone (< 2.5 m depth) by 
electrofishing (Smith-Root CataRaft boat). Counts of yellow perch were obtained for n = 160 
locations equally distributed between the north and south shores of the lake (Fig. 1). Each location 
represents the centroid of a 20-min fishing trajectory measuring approximately 4 m in width and 
650 nr in length and running parallel to the shoreline. The fishing trajectories provided extensive 
coverage of the available littoral habitat along both shorelines. All samples were collected by a single 
team of trained operators. Fish sampling was carefully standardized to reduce variation in sampling 
efficiency among locations: current intensity was always maintained between 6 and 7 amp to control 
for variation in water conductivity; fish were only collected near the water surface to reduce the 
effect of water transparency on visibility; operators were equipped with polarized sunglasses and 
visors to reduce glare and improve visibility; mean depth at all locations (< 1.56 m) allowed for 
coverage by the electric field of the entire water column along the trajectory; sampling was only 
conducted on days with at most moderate breeze (< 4 on the Beaufort scale). This protocol yielded 
a single measurement of relative abundance (counts per 20 min of standardized sampling) for each 
location. To reduce the time and effort required to move between locations, samples were collected 
from a cluster of either four or eight adjacent locations on each of 38 sampling dates (four locations 
on 36 dates; eight locations on two dates). The sampling dates were unevenly spaced in time over 
a period of 70 days, and the north and south shores were visited in alternation on consecutive 
sampling dates. This sampling design yielded measurements that were clustered both in space and 
in time, in contrast with the simultaneous sampling of all locations at all occasions characteristic 
of many spatio-temporal sampling schemes. 

Environmental covariates for the Gaussian process mean structure. In Lake Saint Pierre, suitable 
habitat for yellow perch is concentrated in the shallow littoral zones that border the lake shores. 
Fish counts are expected to respond locally to habitat conditions at the site of capture, which are 
characterized in this study by a set of four environmental covariates measured at each location: 
water depth, transparency, vegetation, and substrate composition. These four covariates together 
with an intercept were included in the mean structure of all models considered here. 

Geodetic covariate for the Gaussian process covariance structure. Two of the models we consider 
include, in addition to the environmental covariates in the mean structure, a covariate in the cor¬ 
relation function, geodetic lake depth, which allows for non-stationarity of the covariance structure 
of the spatial random effects. Including information on this covariate in the covariance structure of 
the spatial process provides a flexible yet relatively simple means of capturing anisotropy along the 
shorelines of the lake. Geodetic lake depth, measured as water depth minus the lake level relative 
to a fixed International Great Lakes low-water datum (IGLD55), was calculated for each location 
and sampling date. Locations at a given geodetic depth he along a common isobath, or equal-depth 
contour along the lake bottom (Fig. 1). Relative abundance at sites distant from each other but 
having similar geodetic depth may be linked by dynamic processes such as fish movements along 
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Figure 1. Study locations along the north and south shores of Lake Saint Pierre (circles). Each location represents 
the centroid of a fishing trajectory approximately 650 m in length. Symbol shading is proportional to geodetic lake 
depth. The central navigation channel is represented by the dotted curve. Water flows from south-west to north-east. 


lake depth contours. Large-scale processes such as movement can generate spatial correlation in 
fish counts, yet their effects may not be adequately captured by the local environmental covariates, 
which are typically selected to reflect habitat preference at smaller scales. When information on 
movement dynamics is not available for inclusion in the mean structure, inclusion of covariates such 
as geodetic depth in the covariance structure may be useful for capturing correlations generated by 
the underlying latent process. 

This paper is organized as follows. Section 2 describes the spatial generalized linear mixed models 
(SGLM) considered in this study. We explore three classes of Gaussian processes for the local latent 
effects: one defined over the whole lake, another defined over a circle, and lastly one that assumes 
independent Gaussian processes for each shore. For the models that assume either a single Gaussian 
process over the lake or independent Gaussian processes for each shore, we also allow for anisotropy 
of the local effects. Anisotropy is represented either geometrically or, alternatively, by use of a 
covariate in the correlation structure of spatial processes. Section 3 describes the inference and 
model selection procedures. Section 4 presents and interprets the results obtained under the fitted 
models, including the potential influence of spatial confounding. Section 5 concludes by discussing 
the advantages of the fitted models for understanding the spatial distribution of yellow perch in 
Lake Saint Pierre. 

2. Modelling environmental covariate, temporal, and spatial effects. We assume that 
observed fish counts are realizations from a Poisson-lognormal mixture. Specifically, T(s) is the 
number of fish observed at location s and 

Y( s) | A(s) ~ Poi( A(s)) 

follows a conditionally independent Poisson distribution. We assume 
(2.1) log(A(s)) = X(s)/3 + 5(s), 

where X(s) is a LT-dimensional row vector containing a value of 1 and the K — 1 environmental 
covariates observed at location s and (3 = (/3o, Pi, • • • , /3k-i)' is the corresponding parameter vector 
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of regression coefficients (i.e., fio is an intercept). The second component in (2.1), £(■), is a mixing 
component that comprises temporal and spatial random effects and allows for over dispersion in the 
Poisson distribution. We decompose this mixing component as the sum of two independent terms, 

( 2 . 2 ) <5(s) = 7 (t(s)) + Z(s ), 

where r j(t(s)) captures temporal effects common to the nt locations sampled on a particular day, 
and £(■) is a deterministic function that assigns to each location s the ordinal rank corresponding 
to the Julian day on which the location was sampled, that is, (t(-) £ l,-- - ,T}, where T = 38 
is the total number of sampling days. The spatial term Z( s) captures local effects that remain 
after accounting for environmental covariate and temporal effects. The simplifying assumption of 
additivity of temporal and spatial terms was required because substantially more replication would 
have been needed to allow for inclusion of time-space interactions. In all, 11 models are considered, 
all of which include the four environmental covariates as well as temporal effects. As well, all models 
but one (a non-spatial“baseline” model; see below) include spatial effects. 

2.1. Modelling the temporal effect 7 (t(-)). Sampling days are unevenly spaced. For each sam¬ 
pling day t we have observations at nt different locations. We initially assumed that the temporal 
effects 7 = ( 7 ( 1 ),-- - , 7 (T))' follow a multivariate zero-mean normal prior distribution and that 
the correlation between components of 7 decay exponentially in time, with Cov('y(t),'y(t 1 )) = 

t 2 exp | J(t) — J(t') |^ , where J(t ) is the Julian day associated with the t-th sampling day, r 2 

represents the variance of 7 , assumed constant across time, and </> 7 > 0 captures time-dependent 
decay in the correlation structure among elements of 7 . Exploration of different prior specifications 
for yielded no evidence of correlation in the components of 7 across time. We therefore retain a 
simpler independence structure in subsequent analyses; all results reported in Section 4 are based 
on an independent normal prior distribution with zero mean and variance r 2 for each 7 (-)- 

2.2. Modelling the local effects Z(-). Let Z = (Z( si),--- ,Z(s n ))' be the n-dimensional vector 
obtained by stacking the latent local effects of all observed locations Z(sj), i = 1, • • • , n. We assume 
that Z is a partial realization from a zero-mean Gaussian process with covariance matrix X. A 
key aspect differentiating the models we explore here is the definition of the covariance matrix 
S. To account for spatial effects arising from unmeasured (latent) ecological processes, such as 
land-water exchanges, barrier effects of the central navigation channel, or behavioural aggregation, 
we examine 10 spatial models representing different combinations of spatial domain and spatial 
correlation structure. For comparison we include as well a baseline model with no spatial effects 
(MO), that is Z( s) = 0,Vs. The spatial domain and correlation structure of the 11 models under 
consideration, labelled MO through M10, are summarized in Table 1. 

2 . 2 . 1 . Specification of spatial domain and choice of Gaussian process priors. 

A single Gaussian process over the lake (models Ml-Mf). As a first approximation, the compo¬ 
nents of Z may be viewed as a partial realization from a random field defined over the whole lake. We 
assume that the local effects Z follow a zero-mean Gaussian process prior such that Z ~ 1V(0, S), 
where X is a n-dimensional covariance matrix, with each element given by X(s,s') = a 2 p( s,s'; (f>), 
where a 2 is the variance of the process and p(s,s';(fi) is a valid correlation function that depends 
on a parameter vector <j>. The hyperparameters to be estimated for this Gaussian process are 
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Table 1 

Spatial domain and correlation structure of the 11 models considered in this study. 


Spatial 

domain 


Correlation structure 



None 

Independence 

Isotropy 

Anisotropy 

Geometric Covariate 

in 

correlation 

Whole lake 

M0 

Ml 

M2 

M3 

M4 

Circular 

By shore 


M7 

M5, M6 

M8 

M9 

M10 


A single Gaussian process over a circle (models M5 and M6). The sampling locations form an 
approximately elliptical arrangement along the north and south shorelines of the lake (Fig. la of 
Section 1 of the supplemental article (Schmidt et ah, 2015)). Correlations between locations may 
therefore be induced by ecological processes associated with depth contours that follow the shoreline, 
such as water flow and fish movements. We examine this possibility by projecting the locations onto 
a unit circle and fitting models that assume a Gaussian process over a circle. Let ui be the angular 
distance between any two points c\ and C 2 on the circle 8. The Euclidean distance between points c\ 
and C 2 (i.e., the chord distance) is £ = 2rsin(w/2). The function p^(ur,4>) = exp j — ^ 2rsin(w/2)| 
is a possible correlation function for a homogeneous random field on the circle § (Yaglom, 1987, 
pp. 387-389). Gneiting (2013) notes that the functions p^(ur,(p) and /Tj(w; (f>) = exp(— co/cp), for 
uj € [0,7r], are both valid, positive definite correlation functions on the circle, and have similar 
behaviour, particularly in the critical neighbourhood of u: = 0. Here, we define an isotropic random 
field on a circle of radius r = 1 and fit the circular models under both correlation functions, p £ and 
p u . The two correlation functions yielded very similar results and therefore only those based on p £ 
are presented in Section 4. The hyperparameters to be estimated for the circular Gaussian process 
are r/ Circ = (a 2 , (p). 

Two different circular projections of the sampling location coordinates are considered: 

1. Model M5: An ellipse is fitted to the UTM location coordinates by orthogonal least-squares 
(Fig. la in Section 1 of the supplemental article (Schmidt et ah, 2015)). The original space 
is then shrunk to yield identical ellipse semi-axes of unit length, and the shrunken location 
coordinates are projected radially (scaling by vector norm) onto the resulting unit circle (Fig. 
lb in Section 1 of the supplemental article (Schmidt et al., 2015)); 

2. Model M6: The centred location coordinates are projected radially (scaling by vector norm) 
onto a unit circle (Fig. lc in Section 1 of the supplemental article (Schmidt et ah, 2015)). 

Note that the assumption of isotropy of the local effects holds only after the projection of the 
original geographical locations onto a circle. 

Separate Gaussian processes for the north and south shores (models M7-M10). The potential 
of the navigation channel to act as a barrier to ecological exchanges between shores, as well as 
previous work which points to marked differences in the spatial structure of yellow perch growth 
between the two shores (Glemet & Rodriguez, 2007) suggest that local effects may have different 
covariance structures in the north and south shores. We therefore explore models that assume 
independent zero-mean Gaussian processes over each shore, each with its own covariance structure: 
Htv(s, s') = ct 2 n p{ s, s'; (J)]\r ) for the north shore and Xls(s, s') = cr| p( s, s'; (pz) for the south shore. 
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2.2.2. Specification of spatial correlation structures: isotropy and nonstationary spatial effects. 
For the isotropic cases we assume an exponential correlation function, p(d, <f) = exp ^||s — s'||^, 

where d = ||s — s'|| denotes Euclidean distance between locations s and s', and (f> > 0 is the 
decay parameter of the correlation function. The hyperparameters to be estimated in the isotropic 
covariance structure are rfi = (a 2 ,cf). 

In the present study water flow, lake morphometry, land-water exchanges, and the presence of 
the central navigation channel may all induce directional effects on the correlations between fish 
counts. We therefore explore correlation functions for the local spatial effects Z, that relax the 
assumption of isotropy. Specifically, we discuss two different approaches, geometric anisotropy, and 
inclusion of covariates in the correlation function. 


Geometric anisotropy (models M3 and M9). In this approach, a stationary covariance structure 
is transformed by differential stretching and rotation of the coordinate axes to capture directional 
effects (Diggle &; Ribeiro, 2007). In the models considered here, the spatial effects Z are correlated 
as a function of a linear transformation of the original coordinate system. We assume S(s,s') = 
o- 2 p(||/(s) - /(s')||,0), where /(s) = sA, and 


1 0 

o r R l V 

where if a €= (0,27r) is the anisotropy angle and i/jr > 1 is the anisotropy ratio (Diggle &; Ribeiro, 
2007). The hyperparameters to be estimated in the geometric anisotropy model are rj G = (a 2 , <f, if a, i(r)- 

Inclusion of covariates in the spatial correlation function (models Mf and M10). A potential 
limitation of geometric anisotropy models is that they rely on a highly symmetrical representation 
that can describe global directional features over the study region, but not local patterned variation. 

An alternative approach, which retains model simplicity while affording additional flexibility, is 
to allow for inclusion of covariates in the correlation structure of Gaussian processes. Following 
Schmidt & Rodriguez (2011a), we include geodetic depth as a covariate in the correlation structure 
of Z to allow for non-stationarity of the resultant covariance structure of the local random effects. 
Specifically, we assume s c = (si, s 2 , q(s 1 , S 2 )) / = (s, q( s ))\ where si and s 2 are easting and northing 
coordinates and g(si, s 2 ) is the observed geodetic depth at location s = (si, S 2 )- The covariate g(.,.) 
is a function of s 1 and s 2 , and so one can think of this spatial process as defined in a two-dimensional 
manifold (Schmidt et al., 2011). We model the elements of S as 


A = 


cos if a — sin if a 
sin if a cos if a 


S(s c ,s' c ) = cr 2 exp|-^- || s — s' || --|-|g(s) - <?(s')| j , 

with (fi,4>2 > 0. This covariance function is non-stationary in the two-dimensional manifold. For 
this model, the hyperparameters to be estimated are rf = (a 2 , <fi, (ff)- 


3. Inference procedure and model comparison. 

Likelihood function. Let y = (y(si),-- - ,y( s n ))' be the observed count at each of the sampling 
locations (Fig. 1) over the sampling period. The vector 6 comprises all the parameters and hyper¬ 
parameters involved in the model. The hyperparameters to be estimated are those related to the 
prior specification for j(.) and Z as discussed above. Conditional on A(sj) each observation is an 
independent realization from a Poisson distribution; therefore, the likelihood function follows the 
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relationship 


n 

/(y I 0) oc JJexp {—A(sj)} [A(sj)] y(si) . 

2—1 

We follow the Bayesian paradigm to obtain estimates of the unknowns in the model. 

Prior specification of the components of 6. We assume that all components of 6 are independent 
a priori. For the components of (3 we assume a normal prior distribution with zero mean and 
some large variance, to represent our lack of information on how each component of X(.) influences 
the mean of the Poisson distribution. For the variances r 2 and a 2 we assume inverse gamma 
distributions with infinite variance and mean based on the residual standard error estimate of a 
log-linear fit. For the decay parameters of the correlation functions (fi) we assign gamma prior 
distributions having unit mean and variance. The components of Z were not expected a priori 
to show very strong spatial correlation, and so the prior specification for sets the range of an 
isotropic spatial process at half or less of the maximum observed distance, with 99% probability. 

For the geometric anisotropy model we assume, a priori, if a ~ U (0, n) for identifiability (to 
ensure that orientations are unique within the interval considered). Prior specifications for the 
anisotropy ratio hyperparameter must satisfy the constraint ipR > 1- We assign a Pareto prior 
distribution to i/jr with scale parameter 1 and shape parameter 2 . This prior has mode 1 (corre¬ 
sponding to the isotropic case), mean 2, and infinite variance. 

Posterior distribution. Following the Bayesian paradigm, the posterior distribution is proportional 
to likelihood times prior. For all models considered above, the posterior distribution does not admit 
a simple closed form. Therefore, we used Markov chain Monte Carlo algorithms, specifically Gibbs 
sampling with some Metropolis-Hastings (M-H) steps (Metropolis et ah, 1953; Hastings, 1970), to 
sample from the posterior. 

We reparametrized the model described in equations (2.1) and (2.2) to build a more efficient 
MCMC sampling scheme. We let log(A(sj)) = X*(sj)/3* + hU(sj), and VF(sj) = /3o+ 7 (t(sj)) + Z(sj), 
where X*(.) does not have a column of ones and (3* = (/%, • • • ,(3k- i)- The covariate coefficients 
do not have full conditional posteriors of closed form, so they were sampled using M-H steps whose 
proposal distribution was based on the algorithm proposed by Gamerman (1997). The hF(sj) were 
sampled using a random walk M-H step with the proposal variance tuned to yield acceptance 
rates approaching 0.44 (Roberts &; Rosenthal, 2009). Let W and Z be the vectors obtained by 
separately stacking the VF(sj) and Z(sj). Z is assumed to follow a zero-mean multivariate normal 
distribution, with covariance matrix X. We can write W = l n /?o + B 7 + Z, which follows an 
n-dimensional multivariate normal distribution, with mean l n /?o + B 7 , and covariance matrix S. 
Here, 7 = ( 7 ( 1 ), • • • ,7 (T))' is the T-dimensional vector comprising the temporal random effects, 
and B is a n x T matrix, where each row is given by the T-dimensional row-vector e* having t th 
column equal to 1 and all other elements equal to zero. Row vector e* enters in B nt times. Under 
this reparametrization, it is easy to sample from the known full conditional posteriors of (3q and 
7 (normal distributions), and of a 2 and r 2 (inverse-gamma distributions). The parameters in the 
spatial correlation function result in unknown full conditional posteriors from which we sampled 
using M-H steps. For all models, we ran the MCMC algorithm (two chains with overdispersed start¬ 
ing values) for 70,000 iterations, used 10,000 iterations as burnin, and stored every 60 t/l iteration. 
The MCMC algorithm was implemented in the Ox programming language, v. 6.20 (Doornik, 2007). 
Convergence was checked using the diagnostics in R package coda (Plummer et al., 2006). 


Model comparison. Four different criteria were used to compare models: (i) deviance informa¬ 
tion criterion (DIC) (Spiegelhalter et al., 2002), (ii) ranked probability score (RPS), (iii) loga¬ 
rithmic score (LS), and (iv) Dawid-Sebastiani score (DSS) (Section 2 of the supplemental article 
(Schmidt et al., 2015)). The last three criteria are proper scoring rules, proposed by Gneiting & Raftery 
(2007) and discussed for discrete observations in Czado et al. (2009). As emphasized by Czado et al. 
(2009), propriety is an essential property of a scoring rule that encourages honest and coher¬ 
ent predictions and ensures that both calibration (consistency between predictions and observa¬ 
tions) and sharpness (concentration of predictive distributions) are being addressed. Following 
GschloBl &; Czado (2008), we use the same data for estimation and computation of the scores, as 
our focus is on understanding the distribution of the counts over the lake rather than prediction. 

For all criteria, the best model among those fitted is that having the smallest value for the criterion. 

4. Results. 

Model comparisons based on DIC, RPS, LogS, and DSS. The baseline model excluding spatial 
effects (M0) had very poor performance relative to all other models (Table 2). Comparisons among 
the models that assume a single Gaussian process over the lake (Ml to M4), and among those 
that allow for different local spatial processes on the north and south shores (M7 to M10), indicate 
that anisotropic models provided better fits than the independent or isotropic models (Table 2). 
Overall, the model that performed best under all four criteria (M9) incorporated both anisotropy 
and a separate Gaussian process for each shore. Posterior predictive checks for M9 (Fig. 2 in 
Section 3 of the supplemental article (Schmidt et al., 2015)) show good agreement between observed 
and fitted values. The importance of considering separate spatial structures for the north and 
south shores is also supported by the fact that M5, which emphasizes the distance between shores 
relatively more than M6, performs better than the latter under all criteria. These results suggest 
that after accounting for the effects of time and of the measured environmental covariates, the 
spatial distribution of yellow perch in Lake Saint Pierre is better modelled with local random 
effects that originate from different processes in the north and south shores. 

Table 2 

Model comparisons based on DIC, RPS, LogS, and DSS. For each criterion, the value associated with the 

best-performing model is given in bold characters. 


Model 

Correlation structure 

D 

PD 

DIC 

RPS 

LogS 

DSS 

M0 

None 

1591.7 

36.6 

1628.2 

6.42 

4.98 

25.2 

Whole lake 

Ml 

Independence 

905.0 

133.6 

1038.6 

2.56 

2.83 

3.59 

M2 

Isotropy 

906.9 

127.2 

1034.1 

2.60 

2.83 

3.58 

M3 

Anisotropy - Geom. 

906.2 

127.2 

1033.4 

2.61 

2.83 

3.58 

M4 

Anisotropy - Cov. in cor. 

902.9 

128.4 

1031.3 

2.57 

2.82 

3.57 

Circular 

M5 

Isotropy 

910.1 

131.0 

1041.0 

2.60 

2.84 

3.60 

M6 

Isotropy 

918.5 

133.1 

1051.6 

2.68 

2.87 

3.63 

By shore 

M7 

Independence 

909.0 

133.4 

1042.4 

2.58 

2.84 

3.59 

M8 

Isotropy 

899.3 

126.5 

1025.8 

2.56 

2.81 

3.54 

M9 

Anisotropy - Geom. 

897.3 

125.1 

1022.4 

2.54 

2.80 

3.54 

M10 

Anisotropy - Cov. in cor. 

900.8 

129.1 

1029.9 

2.55 

2.81 

3.56 


In what follows we focus on results obtained under the four models that assume different local 
structures of the covariance matrix 51 for the north and south shores (M7, M8, M9, and M10). 
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Note that these four models can be viewed as nested: M8 is a particular case of M9 if ip a = 0 and 
ipR = 1, M10 is in turn a particular case of M8 if 4>2 —i► oo, and M7 is a particular case of M8, 
M9, and M10, obtained by setting the appropriate decay parameters <f> ~ 0. For this reason, we 
examine the posterior distributions of the parameters to assess the information gain relative to the 
prior under each of these models, as suggested by Schmidt & Rodriguez (2011b). 

Environmental effects. The prior structure assumed for the latent local effects does not seem to 
influence much the estimates of the effects of the environmental covariates that determine the 
mean structure of the Gaussian process (Fig. 2). The responses to envrionmental covariates in 
models M7 through M10 suggest that the relative abundance of yellow perch in Lake Saint Pierre 
is responding primarily to transparency (positively) and depth (negatively), and not to vegetation 
or substrate. The estimated effects of transparency and depth are very substantial and thus have 
major implicatons for the spatial distribution of yellow perch in the lake. Over the observed ranges 
for these two covariates, these effects imply approximately four-fold (transparency) and eleven-fold 
(depth) changes in the Poisson mean for relative abundance. 
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Figure 2. Posterior summary of regression coefficients (Pi, /?2, /?3, Pi)', for the four environmental covariates (depth, 
transparency, vegetation, and substrate) included in the mean structure of the model. Results are shown for the four 
models that assume different local structures of the covariance matrix E for the north and south shores: Ml, M8, M9, 
and M10. Solid circles: posterior mean of P; vertical lines: 95% credible interval. 


Temporal effects. Posterior distributions of temporal effects y(t(-)) under the independence model 
M7 seem to reflect structure in the data that is not apparent for the models that incorporate spatial 
correlation, M8, M9, and M10, none of which shows evidence of trends or other substantial temporal 
effects (Fig. 3). M8, M9, and M10 show a generalized reduction in the variance of the temporal 
effects relative to M7, as seen in the reduced spread of those temporal effects most distant from 
zero, e.g., for days 5, 6, 27, 33, and 42. This reduction presumably reflects the use of information 
contained in spatial correlations to explain some of the variation attributed to the temporal effects 
in M7. 


M7 M8 M9 M10 



0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 

Days since 13 June 2007 Days since 13 June 2007 Days since 13 June 2007 Days since 13 June 2007 


Figure 3. Posterior summary of the temporal effects 7(t(s *)), for each of the I = 38 sampling dates, under models 
Ml, M8, M9, and M10. Solid circles: posterior mean ofj(.); vertical lines: 95% credible interval. 
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Anisotropy. Estimates of the decay parameters of the exponential correlation functions {(px and 
(j>s) under M8, M9, and M10 suggest independence of the local effects on the south shore (second row 
of Fig. 4). However, the posterior distributions for the anisotropy ratio (i/jr) and anisotropy angle 
{ip a) under M9 provide strong evidence of anisotropy associated with the spatial process on the 
north shore, indicating a slower decay of correlation along the SW - NE direction (third and fourth 
columns of Fig. 4). Strong evidence for anisotropy also emerges when geodetic depth is considered in 
the correlation structure of Z under M10 (fifth and sixth columns of Fig. 4). The presence of spatial 
correlation in the north shore and apparent spatial independence in the south shore is consistent 
with the previous finding that individual growth of yellow perch shows marked spatial heterogeneity 
on the north shore, but spatial homogeneity on the south shore (Glemet & Rodriguez, 2007). 



Figure 4. Posterior distribution (histogram) and prior density (curve) of each of the hyperameters in the respective 
correlation functions of models M8, M9, and M10 (columns), for the north and south shores (rows). 



0.0 0.5 1.0 1.5 2.0 2.5 3.0 

Angular distance (©) 



(a) 


(b) 


Figure 5. (a) Spatial correlation as a function of angular distance from the equator (0). Dashed curve: GAM smooth 
for the covariate-in-correlation model M10 (adjusted for geographical distance; y axis on the right). Dotted curve: GAM 
smooth for the geometric anisotropy model M9 (y axis on the left). Vertical line: posterior mean of the anisotropy 
angle ip a- (b) Anisotropic effects on the north shore. Sampling locations are shown together with 1) direction of lake 
major axis (solid line); 2) ellipse representing the anisotropy angle ip a and ratio ipR from model M9; 3) direction of 
least change in spatial correlation from model M10 (dashed line). 
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Geometric anisotropy can be readily represented graphically, e.g., by means of elliptical contours 
on a map. In contrast, the anisotropy estimated from covariate-in-correlation models can be difficult 
to visualize because it reflects underlying variation in the covariates, which may show little spatial 
pattern. Therefore, we used generalized additive model (GAM; Wood, 2006) analyses of spatial 
correlation as a heuristic device to understand the orientation of anisotropic effects in the covariate- 
in-correlation model M10. We calculated the pairwise geographical distances between all sampling 
locations and the smallest positive angle 0 subtended by the line connecting each pair of locations 
and the equator. A GAM was used to represent the spatial correlation fitted under the covariate- 
in-correlation model M10 as a function of the angle 0 and geographical distance. The results from 
this analysis were used to generate partial smooths showing change in spatial correlation as a 
function of angle 0 after adjustment for geographical distance. For comparison, this analysis was 
also performed for the spatial correlation fitted under the geometric anisotropy model M9. The 
GAM analyses for the covariate-in-correlation model M10 and the geometric anisotropy model M9 
show close agreement and successfully retrieve the direction of least change in correlation specified 
by the fitted anisotropy angle if a (Fig. 5). Both anisotropic models indicate that spatial correlation 
decays most rapidly along a direction approximately perpendicular to the north shoreline and to 
the lake’s major axis. 

Latent spatial effects. To better understand the behaviour of spatial effects on the north and 
south shores, we examined samples from the posterior distribution of the correlation among the 
components of Zjv and Z 5 . Models M 8 , M9, and M10 yield substantially different estimates for 
decay with distance (Fig. 3 top left in Section 3 of the supplemental article (Schmidt et al., 2015)) 
and variability (Fig. 3 top right in Section 3 of the supplemental article (Schmidt et al., 2015)) of 
spatial correlation among the local components on the north shore. Ranges of the posterior 95% 
credible intervals of correlations under the geometric anisotropic model (M9) are much wider than 
those obtained under models M 8 and M10; this may be related to the relatively uninformative prior 
assigned to i/jr. In comparison to M 8 and M9, M10 yields narrower ranges of the posterior 95% 
credible intervals of correlations on the north shore (top right panel of Fig. 3 in Section 3 of the 
supplemental article (Schmidt et al., 2015)). In contrast with the results for the north shore, models 
M 8 , M9, and M10 yield similar estimates of spatial correlation for the south shore, showing rapid 
decay of correlation with distance and providing another indication that the local latent effects are 
not spatially correlated on the south shore. 

For the two shores, substantial spatial variation in relative abundance of yellow perch still remains 
after accounting for temporal and observed environmental effects (Fig. 6 ). This local variation in 
relative abundance may be linked to unmeasured environmental effects, such as changes in optical, 
thermal, or chemical properties arising from the influence of tributaries that enter the lake. Local 
variation may also result from the concentration of mature adult fish at favorable spawning grounds 
during the reproductive season and subsequent downstream movement of groups of young fish 
from the spawning grounds. Larval fish have limited swimming ability and may be transported 
downstream by advection during the first weeks after spawning, until they have grown sufficiently 
to hold position against the flow. Interestingly, the strongest local latent effects appear to be 
concentrated in four areas located downstream of the four known spawning grounds identified in 
Bertolo et al. (2012). 

Spatial confounding. A key objective in many applications involving spatial regression is to esti¬ 
mate the fixed effects while accounting for spatial correlation (Reich et al., 2006; Hughes & Haran, 
2013). Spatial confounding, which arises when the covariates and the spatial effects are not indepen¬ 
dent, can lead to estimates of the posterior mean and variance of fixed effects that markedly differ 
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660 665 670 675 680 

Easting 

Figure 6. Posterior mean of the local latent effects Z for the north and south shores under model M9 (open circles). 
The diameters of the open circles are proportional to e z ^ s '\ i = 1, • • • ,n. Locations corresponding to the 10 greatest 
posterior mean values of 7i are shown as darker (black) circles. The approximate locations of the four known spawning 
areas in the lake are shown as shaded circles. 


from those of the nonspatial regression model (Reich et al., 2006; Paciorek, 2010; Hodges & Reich, 
2010; Hughes & Haran, 2013). The most common approach for dealing with spatial confounding 
is restricted spatial regression (RSR), in which the spatial random effects are constrained to the 
orthogonal complement (residual space) of the fixed effects (Hodges & Reich, 2010; Hanks et al., 
2015). The conditional likelihood functions of the data under SGLM and RSR are identical and 
therefore one can not choose between the two approaches based on in-sample data alone, which can 
be problematic when the interpretation of fixed effects is of interest and the two approaches yield 
different estimates of these effects (Hanks et al., 2015). 

In the geostatistical (continuous spatial support) setting, (Hanks et al., 2015) show that RSR 
provides computational benefits relative to the spatially confounded SGLM, but that Bayesian cred¬ 
ible intervals under RSR can be inappropriately narrow under model misspecification. To mitigate 
this potential problem, they propose a posterior predictive approach (RSR-PPD) which adjusts 
the variance of the regressor coefficients to reflect possible collinearity between fixed and random 
effects. 

To investigate possible spatial confounding between the covariates and the spatial effects Z(-), we 
adapted the restricted spatial regression (RSR) model proposed by Hanks et al. (2015) to deal with 
a Poisson response variable, as detailed in Section 4 of the supplemental article (Schmidt et al., 
2015). As all model comparison criteria pointed to the models with different spatial processes per 
shore as the best among those fitted under the SGLM approach (M8, M9, and M10; Table 2), we 
implemented the RSR and RSR-PPD approaches under the correlation structures corresponding 
to these models. We then compared the posterior summaries of the coefficients of the four envi¬ 
ronmental covariates under the SGLM (/ 3 ), RSR (a), and RSR-PPD (/ 3 ) approaches. Similarly to 
Hanks et al. (2015), the posterior 95% credible intervals of the covariate coefficients under RSR 
were narrower than those obtained under SGLM and RSR-PPD (Fig. 4 in Section 4 of the supple¬ 
mental article (Schmidt et al., 2015)). The intervals for vegetation and substrate overlapped zero 
under SGLM and RSR-PPD but were strictly positive under RSR (Fig. 4 in Section 4 of the sup¬ 
plemental article (Schmidt et al., 2015)). The spatial covariance structure assumed for the spatial 
effect (M8, M9, or M10) did not seem to influence the posterior distribution of the coefficients. 
Interestingly, although the posterior means for the regressor coefficients under RSR and RSR-PPD 


13 




were shifted relative to those under SGLM, substantive interpretation of covariate effects based 
on the 95% credible intervals (i.e., whether intervals overlap zero) was similar under SGLM and 
RSR-PPM (only depth and transparency are important), and this interpretation contrasted with 
that under SRS (all covariates are important). 

Following (Hanks et ah, 2015) we also ran a simulation study to examine the influence of model 
misspecification on the coverage of covariate credible intervals. We generated multiple data sets 
from each of two models, a SGLM and a SRS, and then fit each model to all the data sets. For data 
generated from the SGLM, coverage of the true value of covariate coefficients was adequate for the 
SGLM and the RSR-PPD, but the RSR was unable to recover the true value, with the exception 
of transparency (Fig. 5 in Section 4 of the supplemental article (Schmidt et al., 2015)). Conversely, 
for data generated from the RSR, coverage of the true value of covariate coefficients was similar for 
SGLM, RSR, and RSR-PPD (Fig. 5 in Section 4 of the supplemental article (Schmidt et al., 2015)). 
In agreement with (Hanks et ah, 2015), we found that under model misspecification the covariate 
coefficients resulting from the RSR-PPD and SGLM approaches were conservative, whereas those 
from RSR could be inappropriately narrow. 

Our results on spatial confounding are consistent with the caveat issued by (Hanks et ah, 2015), 
that when the generating mechanism for spatially autocorrelated observations is a spatially missing 
covariate, choosing the RSR over the SGLM assumes that this missing covariate is orthogonal to 
the measured covariates. Because smooth covariates are likely to be collinear, this generally is a 
strong assumption. 

5. Discussion. A hierarchical modelling approach was used to examine the variation in relative 
abundance of a fish species in a lake. We focused on a Poisson-lognormal mixture to model counts 
observed at locations along the shores of the lake over a 70-day period. We examined different 
candidate structures for the lognormal mixing structure, which include a temporal and a spatial 
component. The temporal component accounts for potential effects shared by locations sampled 
on the same day, whereas the spatial component accounts for effects arising from latent ecological 
processes. Environmental effects are incorporated by means of spatially varying covariates that 
reflect local habitat characteristics. In all, we examined 11 models which incorporated the same 
covariates and temporal effects but considered different combinations of spatial domain and spatial 
correlation structure (Table 1). 

The local effects in models M5 and M6 are assumed to follow a Gaussian process over a circle once 
the sampling locations are rescaled by projection onto the unit circle. Gaussian processes defined 
on the circle have been used previously in biological applications, e.g., to describe the spread of an 
airborne plant disease from a point source (Soubeyrand et al., 2008). This approach can be placed 
into the Sampson & Guttorp (1992) framework, in which isotropy holds only after an unknown 
nonlinear transformation of the geographical locations. However, instead of seeking a nonlinear 
transformation to attain isotropy in the deformed space derived from the original configuration, 
we selected the circular transformations a priori based on ecological considerations. Specifically, 
the first projection (M5) reduces intra-shore distances relative to inter-shore distances and thereby 
emphasizes the potential for the central navigation channel to reduce inter-shore correlations, for 
example, by acting as a barrier to fish movement. In comparison, the second projection (M6) more 
closely approximates a scenario in which spatial correlations are determined by distance along 
the shoreline, with little influence of the navigation channel. This projection therefore emphasizes 
longitudinal effects related to water flow, such as contaminant or nutrient gradients along the 
plumes created by tributary streams and rivers entering the lake. The better performance of M5 
relative to M6 points to the importance of inter-shore differences and hints at the operation of 
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different ecological processes in the north and south shores. However, the poor performance of 
M5 and M6 relative to models that treat shores separately (Table 2) suggests that the circular 
projection framework was not sufficiently flexible to capture differences between shores. 

For the nested models M7-M10, which treat shores separately, comparisons of the prior-posterior 
information gain for the key parameters that differentiate the models (Fig. 4) were a useful com¬ 
plement to the model comparison criteria when assessing the relative merits of the models. The 
inclusion of information on geodetic lake depth as a covariate in the covariance structure of the spa¬ 
tial process (M10) provided a flexible means of capturing anisotropy along the shorelines of the lake. 
However, the geometric anisotropy (M9) and covariate-in-correlation (M10) models yielded similar 
substantive results, presumably because the shape of the lake is compact and the arrangement 
of depth contours along both shores is regular. Spatial correlations induced by shoreline-related 
processes therefore have a simple structure that was captured adequately by either model. 

Estimates of environmental effects are very similar for models M7-M10, showing little sensitivity 
to assumptions about the spatial term of the mixing component in those models. Among the envi¬ 
ronmental covariates considered, only water transparency and depth seem to influence the spatial 
distribution of yellow perch. The effects of transparency and depth on relative abundance of yellow 
perch are strong, which points to these water depth and transparency as potential determinants of 
the spatial distribution of this species in the lake. This finding agrees with earlier biological studies 
of local habitat preferences in yellow perch, which are reported to be most abundant in shallow, 
open waters of clear lakes with moderate vegetation and relatively fine substrate (silt to gravel) 
(Scott & Crossman, 1973). However, we find no effect of vegetation and substrate in the present 
study. Vegetation, transparency, depth, and substratum covary naturally in lakes; therefore, their 
effects on the distribution of yellow perch may have been confounded in previous studies that did 
not consider covariates, latent spatial effects, and temporal effects simultaneously. 

The model that performed best (M9) incorporated both anisotropy and a separate Gaussian 
process for each shore. The model comparisons point to marked differences in the posterior structure 
of latent spatial effects for the two shores: anisotropy was conspicuous in the north shore, whereas 
spatial structuring was weak in the south shore. The rate of decay in spatial correlation in the 
north shore had marked directionality and was generally slower in the SW - NE direction, broadly 
in alignment with the shoreline (Fig. 5b). Likely environmental candidates responsible for these 
intershore differences in latent spatial effects include known differences in physical and chemical 
properties of water masses between the two shores, which are influenced by contributions from 
different tributaries and diffuse sources of nutrients and pollutants. Exposure to more differentiated 
water masses in the north shore has been invoked as an explanation for the greater variability in 
growth of yellow perch on the north shore of the lake (Glemet & Rodriguez, 2007). The importance 
of local effects indicates that traditional covariates such as water depth and transparency provide 
only a partial picture of environmental influences on the spatial distribution of yellow perch in the 
lake. Further research on the effects of shoreline-dependent processes and larval transport from the 
spawning grounds may therefore prove fruitful. 

Although fish sampling was carefully standardized, we cannot rule out the possibility that sam¬ 
pling efficiency and thus, probability of detection, were influenced by environmental characteris¬ 
tics, which could confound inference on environmental effects. Identifying environmental effects 
on abundance separately from those on detection in an open population of unmarked individuals 
would require more complex sampling designs and observational models, such as IV-mixture models 
(Royle, 2004). However, these models invoke an assumption of closure over repeated observations 
that may be difficult to justify when using active gear to sample populations of highly mobile fish. 
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The approaches discussed here might pose computational challenges when the number of lo¬ 
cations is large. In such cases, alternative approaches based on Gaussian random Markov fields 
(Lindgren et al., 2011) may be useful to provide a prior distribution for Z. 

An important feature of the present study is its treatment of non-standard spatial features of the 
example, including the spatial arrangement of samples along the lake shoreline and the presence of 
the channel running through the lake. The circular models are expected to be most applicable in sit¬ 
uations in which the spatial domain is approximately oval, but more flexible anisotropic models, such 
as M10, may perform better under more complex domain topologies. Concerns about the influence 
of irregular domain shapes and complex boundaries on the outcome of spatial analyses have been 
voiced previously (Legates, 1991; Ramsay, 2002; Soubeyrand et al., 2008; Miller & Wood, 2014). 
Recently developed methods, such as the soap film smoother (Wood et al., 2008) and generalized 
distance splines (Miller &; Wood, 2014), are promising alternative approaches for dealing effectively 
with complex irregular boundaries or interior holes. However, little effort has been devoted to de¬ 
veloping approaches that systematically compare competing representations of the spatial domain. 
The approaches presented here for specification of spatial domain and choice of Gaussian process 
priors may prove useful in other applications that involve spatial correlation along regular, possibly 
discontinous, contours. Biological examples include samples collected along a mountain perimeter 
within an altitudinal range, at different heights on the bark of a tree trunk, along the edges of 
growing structures (e.g., bacterial colonies, diffusing chemicals), and at interfaces between habitats 
(ecotones). 
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SUPPLEMENTARY MATERIAL 

Additional results for “Population counts along elliptical habitat contours: hierarchi¬ 
cal modelling using Poisson-lognormal mixtures with nonstationary spatial structure” 

(doi: COMPLETED BY THE TYPESETTER; .pdf). This supplement contains four sections which 
provide further results on: 1) circular transformations, 2) model comparison criteria, 3) analyses of 
model fit and correlation of local effects, and 4) restricted spatial regression. 
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